Abstract — We introduce a new analysis of an adaptive mixture
method that combines the outputs of two constituent filters running
in parallel to model an unknown desired signal. This adaptive
mixture is shown to achieve the mean square error (MSE)
performance of the best constituent filter, and in some cases to
outperform both, in the steady-state. However, the MSE analysis
of this mixture in the steady-state and during the transient
regions uses approximations and relies on statistical models of
the underlying signals and systems. Hence, such an analysis
may not be useful or valid for signals generated by various
real-life systems that show high degrees of nonstationarity, limit
cycles and, in many cases, are even chaotic. To this end,
we perform the transient and the steady-state analysis of this
adaptive mixture in a "strong" deterministic sense, without any
approximations in the derivations or statistical assumptions on
the underlying signals, such that our results are guaranteed
to hold. In particular, we relate the time-accumulated squared
estimation error of this adaptive mixture at any time to the
time-accumulated squared estimation error of the optimal convex
mixture of the constituent filters directly tuned to the underlying
signal in an individual sequence manner.

Index Terms — Deterministic, adaptive mixture, convexly
constrained, steady-state, transient.



I. Introduction 

The problem of estimating an unknown desired signal has been
heavily investigated in the adaptive signal processing literature.
However, in various applications, certain difficulties arise
in the estimation process due to the lack of structural and
statistical information about the data model that relates the
observation process to the desired signal. To resolve this
lack of information, mixture approaches have been proposed that
adaptively combine the outputs of multiple constituent algorithms
performing the same task [1]-[12]. These parallel running
algorithms can be seen as alternative hypotheses for modeling,
which can be exploited for both performance improvement
and robustness. Along these lines, a convexly constrained
mixture method that combines the outputs of two adaptive filters
is introduced in [1]. In this approach, the outputs of the
constituent algorithms are adaptively combined under a convex
constraint to minimize the final MSE. This adaptive mixture
is shown to be universal with respect to the input filters
in a certain stochastic sense such that it achieves (and in
some cases outperforms) the MSE performance of the best
constituent filter in the mixture in the steady-state. However,
the MSE analysis of this adaptive mixture for the steady-state
and during the transient regions uses approximations, e.g.,
separation assumptions, and relies on statistical models of the
signals and systems, e.g., nonstationary data models [2]-[4].

Nevertheless, signals produced by various real-life systems,
such as in underwater acoustic communication applications,
show high degrees of nonstationarity, limit cycles and, in many
cases, are even chaotic, so that they hardly fit the assumed
statistical models. Hence, an analysis based on certain statistical
assumptions or approximations may not be useful or adequate
under these conditions. To this end, we refrain from making
any statistical assumptions on the underlying signals and
present an analysis that is guaranteed to hold for any bounded
arbitrary signal without any approximations. In particular,
we relate the performance of this adaptive mixture to the
performance of the optimal convex combination that is directly
tuned to the underlying signal and the outputs of the constituent
filters in a deterministic sense. Naturally, this optimal convex
combination can only be chosen in hindsight, since it requires
the whole signal and the filter outputs to be known a priori
(before we even start processing the data). In this sense, we
provide both the transient and steady-state analysis of the
adaptive mixture in a deterministic sense without any assumptions
on the underlying signals or any approximations in the
derivations. Our results are guaranteed to hold in an individual
sequence manner.

After we provide a brief system description in Section II,
we present a deterministic analysis of the convexly constrained
adaptive mixture method in Section III, where the performance
bounds are given as a theorem and a lemma. The letter
concludes with certain remarks.

II. Problem Description 

In this framework, we have a desired signal $\{y(t)\}_{t\ge 1}$,
where $|y(t)| \le Y < \infty$, and two constituent filters running in
parallel producing $\{\hat{y}_1(t)\}_{t\ge 1}$ and $\{\hat{y}_2(t)\}_{t\ge 1}$, respectively, as
the estimations (or predictions) of the desired signal $\{y(t)\}_{t\ge 1}$.
We assume that $Y$ is known. Here, we have no restrictions on
$\hat{y}_1(t)$ or $\hat{y}_2(t)$, e.g., these outputs are not required to be causal;
however, without loss of generality, we assume $|\hat{y}_1(t)| \le Y$
and $|\hat{y}_2(t)| \le Y$, i.e., these outputs can be clipped to the range
$[-Y, Y]$ without sacrificing performance under the squared
error. As an example, the desired signal and the outputs of the
first stage filters can be single realizations generated under
the framework of [?]. At each time $t$, the convexly constrained
algorithm receives an input vector $x(t) = [\hat{y}_1(t)\ \ \hat{y}_2(t)]^T$ and
outputs

$$\hat{y}(t) = \lambda(t)\hat{y}_1(t) + (1-\lambda(t))\hat{y}_2(t), \tag{1}$$

where $0 \le \lambda(t) \le 1$, as the final estimate. The final estimation
error is given by $e(t) = y(t) - \hat{y}(t)$. The combination weight
$\lambda(t)$ is trained through an auxiliary variable $\rho(t)$ using a stochastic
gradient update to minimize the squared final estimation error
as

$$\lambda(t) = \frac{1}{1+e^{-\rho(t)}}, \qquad \rho(t+1) = \rho(t) + \mu e(t)\lambda(t)(1-\lambda(t))\left[\hat{y}_1(t) - \hat{y}_2(t)\right], \tag{2}$$

where $\mu > 0$ is the learning rate. The combination parameter
$\lambda(t)$ in (2) is constrained to lie in $[\lambda^{+}, 1-\lambda^{+}]$, $0 < \lambda^{+} <
1/2$, in [1], since the update in (2) may slow down when $\lambda(t)$
is too close to the boundaries. We follow the same restriction
and analyze (2) under this constraint.
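For concreteness, the mixture (1) and the update (2) can be simulated with the following minimal Python sketch. It is an illustration under assumptions, not the implementation of [1]: the names `convex_mixture` and `sigmoid` are hypothetical, and the constraint $\lambda(t)\in[\lambda^{+}, 1-\lambda^{+}]$ is enforced here by clipping $\rho(t)$ to $[-\ln\frac{1-\lambda^{+}}{\lambda^{+}}, \ln\frac{1-\lambda^{+}}{\lambda^{+}}]$, which is one common way to impose it.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(r: float) -> float:
    """lambda(t) = 1 / (1 + exp(-rho(t)))."""
    return 1.0 / (1.0 + math.exp(-r))


def convex_mixture(y: Sequence[float], y1: Sequence[float], y2: Sequence[float],
                   mu: float, lam_plus: float,
                   rho0: float = 0.0) -> Tuple[List[float], List[float]]:
    """Run the mixture (1) with the update (2); return (y_hat, lambdas).

    Assumption: rho is clipped so that lambda stays in [lam_plus, 1 - lam_plus].
    """
    r_max = math.log((1.0 - lam_plus) / lam_plus)
    rho = min(r_max, max(-r_max, rho0))
    y_hat: List[float] = []
    lams: List[float] = []
    for yt, p1, p2 in zip(y, y1, y2):
        lam = sigmoid(rho)
        est = lam * p1 + (1.0 - lam) * p2              # final estimate, eq. (1)
        e = yt - est                                    # final estimation error
        rho += mu * e * lam * (1.0 - lam) * (p1 - p2)   # stochastic gradient step, eq. (2)
        rho = min(r_max, max(-r_max, rho))              # keep lambda inside the constraint set
        y_hat.append(est)
        lams.append(lam)
    return y_hat, lams


if __name__ == "__main__":
    n = 50
    y = [1.0] * n
    y_hat, lams = convex_mixture(y, [1.0] * n, [0.0] * n, mu=2.0, lam_plus=0.05)
    loss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    print(f"final lambda = {lams[-1]:.3f}, accumulated loss = {loss:.3f}")
```

In the example, the first filter is exact, so $\lambda(t)$ increases monotonically from $1/2$ toward $1-\lambda^{+}$.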

When applied to any sequence $\{y(t)\}_{t\ge 1}$, the algorithm of
(2) yields the total accumulated loss

$$L_n(\hat{y}, y) = \sum_{t=1}^{n}\left(y(t) - \hat{y}(t)\right)^2$$

for any $n$. Although we use the time-accumulated squared
error as the performance measure, our results can be readily
extended to the exponentially weighted accumulated squared
error. We next provide deterministic bounds on $L_n(\hat{y}, y)$
with respect to the best convex combination $\min_{\beta\in[0,1]} L_n(\hat{y}_\beta, y)$,
where

$$L_n(\hat{y}_\beta, y) = \sum_{t=1}^{n}\left(y(t) - \hat{y}_\beta(t)\right)^2$$

and $\hat{y}_\beta(t) = \beta\hat{y}_1(t) + (1-\beta)\hat{y}_2(t)$; these bounds hold uniformly
in an individual sequence manner without any stochastic
assumptions on $y(t)$, $\hat{y}_1(t)$, $\hat{y}_2(t)$ or $n$. Note that the best
convex combination $\min_{\beta\in[0,1]} L_n(\hat{y}_\beta, y)$, which we compare the
performance against, can only be determined after observing
the entire sequences, i.e., $\{y(t)\}$, $\{\hat{y}_1(t)\}$ and $\{\hat{y}_2(t)\}$, in
advance for all $n$.

III. A Deterministic Analysis

In this section, we first relate the accumulated loss of
the adaptive mixture to the accumulated loss of the best
convex combination that minimizes the accumulated loss in
the following theorem. Then, by providing counterexamples
as a lemma, we demonstrate that one cannot improve the
convergence rate of this upper bound by directly using our
methodology with the Kullback-Leibler (KL) divergence [3] as
the distance measure. We emphasize that although the
steady-state and transient MSE performances of the convexly
constrained mixture algorithm are analyzed with respect to
the constituent filters in [2]-[4], we perform the steady-state and
transient analysis in the following theorem without any
stochastic assumptions or any approximations.

Theorem: The algorithm given in (2), when applied to any
sequence $\{y(t)\}_{t\ge 1}$ with $|y(t)| \le Y < \infty$, yields, for any $n$
and any $\epsilon > 0$,

$$L_n(\hat{y}, y) - \frac{2\epsilon+1}{1-z^2}\min_{\beta\in[0,1]}\left\{L_n(\hat{y}_\beta, y)\right\} \le O\!\left(\frac{1}{\epsilon}\right), \tag{3}$$

where $\hat{y}_\beta(t) = \beta\hat{y}_1(t) + (1-\beta)\hat{y}_2(t)$, $z = \frac{1-4\lambda^{+}(1-\lambda^{+})}{1+4\lambda^{+}(1-\lambda^{+})} < 1$
and the step size is $\mu = \frac{4\epsilon}{2\epsilon+1}\,\frac{2+2z}{Y^2}$, provided that
$\lambda(t) \in [\lambda^{+}, 1-\lambda^{+}]$, $0 < \lambda^{+} < 1/2$, for all $t$ during
the adaptation.

Equation (3) provides an explicit trade-off between the
transient and steady-state performances of the adaptive
mixture in a deterministic sense without any assumptions or
approximations. From (3), we observe that the right-hand side
does not grow with $n$, so that after normalizing both sides by $n$
it converges at rate $O(1/n)$. As in the stochastic case [4], to get
a tighter asymptotic bound with respect to the optimal convex
combination of the filters, we require a smaller $\epsilon$, i.e., a smaller
learning rate $\mu$, which increases the right-hand side of (3).
Although this result is well known in the adaptive filtering
literature and appears widely in stochastic contexts, here this
trade-off is guaranteed to hold without any statistical
assumptions or approximations. Note that the optimal convex
combination in (3), i.e., the $\beta$ minimizing $L_n(\hat{y}_\beta, y)$, depends on
the entire signal and the outputs of the constituent filters for all $n$.

Proof: To prove the theorem, we use the approach introduced
in [6] (and later used in [5]), which measures the progress of
an adaptive algorithm using certain distance measures.

We first convert (2) to a direct update on $\lambda(t)$ and use this
direct update in the proof. Using $e^{\rho(t)} = \lambda(t)/(1-\lambda(t))$, which
follows from $\lambda(t) = 1/(1+e^{-\rho(t)})$, the update in (2) can be
written as

$$\lambda(t+1) = \frac{1}{1+e^{-\rho(t+1)}} = \frac{1}{1+\frac{1-\lambda(t)}{\lambda(t)}e^{-\mu e(t)\lambda(t)(1-\lambda(t))[\hat{y}_1(t)-\hat{y}_2(t)]}} = \frac{\lambda(t)e^{\mu e(t)\lambda(t)(1-\lambda(t))\hat{y}_1(t)}}{\lambda(t)e^{\mu e(t)\lambda(t)(1-\lambda(t))\hat{y}_1(t)} + (1-\lambda(t))e^{\mu e(t)\lambda(t)(1-\lambda(t))\hat{y}_2(t)}}. \tag{4}$$

Unlike [5] (Lemma 5.8), our update in (4) has, in a certain
sense, an adaptive learning rate $\mu\lambda(t)(1-\lambda(t))$, which requires
a different formulation; however, the proof follows similar lines
to [5] in certain parts.
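The equivalence of (2) and (4) can be checked numerically. The sketch below uses hypothetical helper names, and its random bounded inputs are an arbitrary test choice. It applies the update to $\rho(t)$ and maps the result through the sigmoid, applies the multiplicative form (4) directly, and reports the largest discrepancy, which should be at the level of floating-point rounding.

```python
import math
import random


def lam_next_from_rho(lam: float, mu: float, e: float, p1: float, p2: float) -> float:
    """Update (2) on the auxiliary variable, then map back through the sigmoid."""
    rho = math.log(lam / (1.0 - lam))                  # e^{rho(t)} = lambda(t) / (1 - lambda(t))
    rho += mu * e * lam * (1.0 - lam) * (p1 - p2)
    return 1.0 / (1.0 + math.exp(-rho))


def lam_next_direct(lam: float, mu: float, e: float, p1: float, p2: float) -> float:
    """Direct multiplicative update (4)."""
    c = mu * e * lam * (1.0 - lam)
    num = lam * math.exp(c * p1)
    return num / (num + (1.0 - lam) * math.exp(c * p2))


def max_discrepancy(trials: int = 10_000, seed: int = 0) -> float:
    """Largest |difference| between the two forms over random bounded inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        lam = rng.uniform(0.05, 0.95)
        p1, p2, y = (rng.uniform(-1.0, 1.0) for _ in range(3))
        e = y - (lam * p1 + (1.0 - lam) * p2)           # final estimation error
        mu = rng.uniform(0.1, 5.0)
        gap = abs(lam_next_from_rho(lam, mu, e, p1, p2) - lam_next_direct(lam, mu, e, p1, p2))
        worst = max(worst, gap)
    return worst


print(max_discrepancy())
```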

Here, we first define $\hat{y}_\beta(t) = \beta\hat{y}_1(t) + (1-\beta)\hat{y}_2(t) =
u^T x(t)$, where $\beta \in [0, 1]$ and $u = [\beta\ \ 1-\beta]^T$. At each
adaptation, the progress made by the algorithm towards $u$ at
time $t$ is measured as $d(u, w(t)) - d(u, w(t+1))$, where
$w(t) = [\lambda(t)\ \ 1-\lambda(t)]^T$ and $d(u, w) = \sum_{i=1}^{2} u_i \ln(u_i/w_i)$
is the Kullback-Leibler divergence [6], $u \in [0,1]^2$, $w \in
[0,1]^2$. We require that this progress is at least $a(y(t)-\hat{y}(t))^2 -
b(y(t)-\hat{y}_\beta(t))^2$ for certain $a$, $b$, $\mu$ [5], [6], i.e.,

$$a(y(t)-\hat{y}(t))^2 - b(y(t)-\hat{y}_\beta(t))^2 \le d(u, w(t)) - d(u, w(t+1)) = \beta\ln\frac{\lambda(t+1)}{\lambda(t)} + (1-\beta)\ln\frac{1-\lambda(t+1)}{1-\lambda(t)}, \tag{5}$$

which yields the desired deterministic bound in (3) after
telescoping.
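The distance $d(u,w)$ and the progress term in (5) are easy to evaluate. The following sketch (illustrative helper names, arbitrary random $\lambda$ values) computes the KL divergence with the convention $0\ln 0 = 0$ and confirms the telescoping identity used to pass from (5) to (3): the per-step progress summed over time equals $d(u,w(1)) - d(u,w(n+1))$, regardless of how $\lambda(t)$ evolves.

```python
import math
import random
from typing import Sequence


def kl(u: Sequence[float], w: Sequence[float]) -> float:
    """d(u, w) = sum_i u_i ln(u_i / w_i), using the convention 0 ln 0 = 0."""
    return sum(ui * math.log(ui / wi) for ui, wi in zip(u, w) if ui > 0.0)


def progress(beta: float, lam_t: float, lam_next: float) -> float:
    """d(u, w(t)) - d(u, w(t+1)) with u = [beta, 1-beta], w(t) = [lam_t, 1-lam_t]."""
    u = (beta, 1.0 - beta)
    return kl(u, (lam_t, 1.0 - lam_t)) - kl(u, (lam_next, 1.0 - lam_next))


rng = random.Random(0)
beta = 0.3
lams = [rng.uniform(0.05, 0.95) for _ in range(20)]   # arbitrary lambda(1), ..., lambda(n+1)
u = (beta, 1.0 - beta)
total = sum(progress(beta, a, b) for a, b in zip(lams, lams[1:]))
# Telescoping: the summed progress equals d(u, w(1)) - d(u, w(n+1)).
print(total, kl(u, (lams[0], 1 - lams[0])) - kl(u, (lams[-1], 1 - lams[-1])))
```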



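Defining $\zeta(t) = e^{\mu e(t)\lambda(t)(1-\lambda(t))}$, we have from (4)

$$\frac{\lambda(t+1)}{\lambda(t)} = \frac{\zeta(t)^{\hat{y}_1(t)}}{\lambda(t)\zeta(t)^{\hat{y}_1(t)} + (1-\lambda(t))\zeta(t)^{\hat{y}_2(t)}}, \qquad \frac{1-\lambda(t+1)}{1-\lambda(t)} = \frac{\zeta(t)^{\hat{y}_2(t)}}{\lambda(t)\zeta(t)^{\hat{y}_1(t)} + (1-\lambda(t))\zeta(t)^{\hat{y}_2(t)}},$$

so that

$$\beta\ln\frac{\lambda(t+1)}{\lambda(t)} + (1-\beta)\ln\frac{1-\lambda(t+1)}{1-\lambda(t)} = \hat{y}_\beta(t)\ln\zeta(t) - \ln\left(\lambda(t)\zeta(t)^{\hat{y}_1(t)} + (1-\lambda(t))\zeta(t)^{\hat{y}_2(t)}\right). \tag{6}$$

Using the inequality $\alpha^x \le 1 - x(1-\alpha)$ for $\alpha \ge 0$ and $x \in [0, 1]$
(which follows from the convexity of $\alpha^x$ in $x$), we have, for $i = 1, 2$,

$$\zeta(t)^{\hat{y}_i(t)} = \zeta(t)^{-Y}\left(\zeta(t)^{2Y}\right)^{\frac{\hat{y}_i(t)+Y}{2Y}} \le \zeta(t)^{-Y}\left(1 - \frac{\hat{y}_i(t)+Y}{2Y}\left(1-\zeta(t)^{2Y}\right)\right),$$

which implies

$$\ln\left(\lambda(t)\zeta(t)^{\hat{y}_1(t)} + (1-\lambda(t))\zeta(t)^{\hat{y}_2(t)}\right) \le -Y\ln\zeta(t) + \ln\left(1 - \frac{\hat{y}(t)+Y}{2Y}\left(1-\zeta(t)^{2Y}\right)\right), \tag{7}$$

where $\hat{y}(t) = \lambda(t)\hat{y}_1(t) + (1-\lambda(t))\hat{y}_2(t)$. As in [5], one
can further bound (7) using $\ln(1 - q(1-e^p)) \le pq + p^2/8$ for
$0 \le q \le 1$ (originally from [6]), with $q = (\hat{y}(t)+Y)/(2Y)$ and $p = 2Y\ln\zeta(t)$:

$$\ln\left(\lambda(t)\zeta(t)^{\hat{y}_1(t)} + (1-\lambda(t))\zeta(t)^{\hat{y}_2(t)}\right) \le -Y\ln\zeta(t) + (\hat{y}(t)+Y)\ln\zeta(t) + \frac{Y^2(\ln\zeta(t))^2}{2}. \tag{8}$$

Using (8) in (6) yields

$$\beta\ln\frac{\lambda(t+1)}{\lambda(t)} + (1-\beta)\ln\frac{1-\lambda(t+1)}{1-\lambda(t)} \ge (\hat{y}_\beta(t)+Y)\ln\zeta(t) - (\hat{y}(t)+Y)\ln\zeta(t) - \frac{Y^2(\ln\zeta(t))^2}{2}. \tag{9}$$

In the following, $\beta$ is held fixed. We observe from (5) and
(9) that to prove the theorem, it is sufficient to show that
$G(y(t), \hat{y}(t), \hat{y}_\beta(t), \zeta(t)) \le 0$, where

$$G(y(t), \hat{y}(t), \hat{y}_\beta(t), \zeta(t)) = -(\hat{y}_\beta(t)+Y)\ln\zeta(t) + (\hat{y}(t)+Y)\ln\zeta(t) + \frac{Y^2(\ln\zeta(t))^2}{2} + a(y(t)-\hat{y}(t))^2 - b(y(t)-\hat{y}_\beta(t))^2. \tag{10}$$

For fixed $y(t)$, $\hat{y}(t)$, $\zeta(t)$, the function $G(\cdot)$ is maximized with respect to
$\hat{y}_\beta(t)$ when $\partial G/\partial\hat{y}_\beta(t) = 0$, i.e., $\hat{y}_\beta(t) - y(t) + \frac{\ln\zeta(t)}{2b} = 0$, since
$\partial^2 G/\partial\hat{y}_\beta(t)^2 = -2b < 0$, yielding $\hat{y}_\beta(t)^* = y(t) - \frac{\ln\zeta(t)}{2b}$. Note that
while taking the partial derivative of $G(\cdot)$ with respect to $\hat{y}_\beta(t)$
and finding $\hat{y}_\beta(t)^*$, we assume that $y(t)$, $\hat{y}(t)$, $\zeta(t)$ are all fixed,
i.e., their partial derivatives with respect to $\hat{y}_\beta(t)$ are zero. This
yields an upper bound on $G(\cdot)$ in terms of $\hat{y}_\beta(t)$. Hence, it
is sufficient to show that $G(y(t), \hat{y}(t), \hat{y}_\beta(t)^*, \zeta(t)) \le 0$, such that

$$G(y(t), \hat{y}(t), \hat{y}_\beta(t)^*, \zeta(t)) = -(y(t)+Y)\ln\zeta(t) + \frac{(\ln\zeta(t))^2}{2b} + (\hat{y}(t)+Y)\ln\zeta(t) + \frac{Y^2(\ln\zeta(t))^2}{2} + a(y(t)-\hat{y}(t))^2 - \frac{(\ln\zeta(t))^2}{4b} \le 0. \tag{11}$$

Collecting terms, (11) reads

$$a(y(t)-\hat{y}(t))^2 - (y(t)-\hat{y}(t))\ln\zeta(t) + \frac{Y^2(\ln\zeta(t))^2}{2} + \frac{(\ln\zeta(t))^2}{4b} \le 0,$$

and since $\ln\zeta(t) = \mu\lambda(t)(1-\lambda(t))(y(t)-\hat{y}(t))$, this is equivalent to

$$(y(t)-\hat{y}(t))^2\left[a - \mu\lambda(t)(1-\lambda(t)) + \frac{\mu^2\lambda(t)^2(1-\lambda(t))^2}{4b} + \frac{Y^2\mu^2\lambda(t)^2(1-\lambda(t))^2}{2}\right] \le 0. \tag{12}$$

For (12) to hold, defining $k = \lambda(t)(1-\lambda(t))$ and

$$H(k) = k^2\mu^2\left(\frac{Y^2}{2} + \frac{1}{4b}\right) - \mu k + a,$$

it is sufficient to show that $H(k) \le 0$ for $k \in [\lambda^{+}(1-\lambda^{+}), 1/4]$,
since $k$ lies in this interval whenever $\lambda(t) \in [\lambda^{+}, 1-\lambda^{+}]$. Moreover,
$H(k)$ is a convex quadratic function of $k$, i.e.,

$$\frac{\partial^2 H(k)}{\partial k^2} = 2\mu^2\left(\frac{Y^2}{2} + \frac{1}{4b}\right) > 0.$$

Hence, the interval where the function $H(\cdot)$ is nonpositive
should include $[\lambda^{+}(1-\lambda^{+}), 1/4]$, i.e., the roots $k_1$
and $k_2$ (where $k_2 \le k_1$) of $H(\cdot)$ should satisfy $k_1 \ge 1/4$ and
$k_2 \le \lambda^{+}(1-\lambda^{+})$, where

$$k_{1,2} = \frac{1 \pm \sqrt{1-4as}}{2\mu s} \quad \text{and} \quad s \triangleq \frac{Y^2}{2} + \frac{1}{4b}. \tag{13}$$

To satisfy $k_1 \ge 1/4$, we straightforwardly require from (13)

$$\mu \le \frac{2 + 2\sqrt{1-4as}}{s}.$$

To get the tightest upper bound for (3), we set

$$\mu = \frac{2 + 2\sqrt{1-4as}}{s},$$

i.e., the largest allowable learning rate. To have $k_2 \le \lambda^{+}(1-\lambda^{+})$
with this $\mu$, we require

$$\frac{1 - \sqrt{1-4as}}{4\left(1 + \sqrt{1-4as}\right)} \le \lambda^{+}(1-\lambda^{+}). \tag{14}$$

Equation (14) yields

$$as = a\left(\frac{Y^2}{2} + \frac{1}{4b}\right) \le \frac{1-z^2}{4}, \tag{15}$$

where

$$z = \frac{1-4\lambda^{+}(1-\lambda^{+})}{1+4\lambda^{+}(1-\lambda^{+})},$$

and $z < 1$ after some algebra.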
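The root conditions (13)-(15) can be sanity-checked numerically. Assuming $a$ is chosen at equality in (15) and $\mu$ at its largest allowable value, the sketch below (hypothetical function name, arbitrary values $\lambda^{+} = 0.1$, $b = 0.5$, $Y = 1$) computes the roots of $H(\cdot)$; up to rounding they should coincide with $1/4$ and $\lambda^{+}(1-\lambda^{+})$, and $H(k)$ should be nonpositive on a grid over $[\lambda^{+}(1-\lambda^{+}), 1/4]$ and positive outside it.

```python
import math


def h_roots_at_equality(lam_plus: float, b: float, Y: float):
    """Pick a at equality in (15) and the largest allowable mu; return (z, mu, k1, k2, H)."""
    c = lam_plus * (1.0 - lam_plus)
    z = (1.0 - 4.0 * c) / (1.0 + 4.0 * c)
    s = Y ** 2 / 2.0 + 1.0 / (4.0 * b)
    a = (1.0 - z ** 2) / (4.0 * s)                   # equality in (15)
    r = math.sqrt(max(0.0, 1.0 - 4.0 * a * s))       # equals z up to rounding
    mu = (2.0 + 2.0 * r) / s                         # largest mu allowed by k1 >= 1/4
    k1 = (1.0 + r) / (2.0 * mu * s)
    k2 = (1.0 - r) / (2.0 * mu * s)

    def H(k: float) -> float:
        """The quadratic H(k) = k^2 mu^2 s - mu k + a."""
        return k * k * mu * mu * s - mu * k + a

    return z, mu, k1, k2, H


z, mu, k1, k2, H = h_roots_at_equality(lam_plus=0.1, b=0.5, Y=1.0)
lo = 0.1 * 0.9
grid = [lo + (0.25 - lo) * i / 100 for i in range(101)]
print(k1, k2, max(H(k) for k in grid))
```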

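To satisfy (15), we set $b = \epsilon/Y^2$ for any (or arbitrarily small)
$\epsilon > 0$, which results in

$$a \le \frac{(1-z^2)\epsilon}{Y^2(2\epsilon+1)}. \tag{16}$$

To get the tightest bound in (5), we select $a = \frac{(1-z^2)\epsilon}{Y^2(2\epsilon+1)}$ in
(16). Such a selection of $a$, $b$ and $\mu$ results in (5) as

$$\frac{(1-z^2)\epsilon}{Y^2(2\epsilon+1)}(y(t)-\hat{y}(t))^2 - \frac{\epsilon}{Y^2}(y(t)-\hat{y}_\beta(t))^2 \le \beta\ln\frac{\lambda(t+1)}{\lambda(t)} + (1-\beta)\ln\frac{1-\lambda(t+1)}{1-\lambda(t)}. \tag{17}$$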
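Because (17) is a per-step inequality that must hold for every bounded input, it lends itself to a randomized check. The sketch below is only a numerical illustration, not a substitute for the proof: it draws $y(t)$, $\hat y_1(t)$, $\hat y_2(t)\in[-Y,Y]$, $\beta\in[0,1]$ and $\lambda(t)\in[\lambda^{+}, 1-\lambda^{+}]$ at random, uses the theorem's $a$, $b$ and $\mu$, performs one step of (4), and returns the smallest observed value of the right-hand side minus the left-hand side of (17), which should not be negative beyond rounding error. Function names and parameter values are illustrative assumptions.

```python
import math
import random


def step_slack(y, p1, p2, beta, lam, eps, Y, lam_plus):
    """Return RHS - LHS of (17) for a single time step (should be >= 0)."""
    c = lam_plus * (1.0 - lam_plus)
    z = (1.0 - 4.0 * c) / (1.0 + 4.0 * c)
    a = (1.0 - z * z) * eps / (Y * Y * (2.0 * eps + 1.0))
    b = eps / (Y * Y)
    mu = 4.0 * eps / (2.0 * eps + 1.0) * (2.0 + 2.0 * z) / (Y * Y)
    y_hat = lam * p1 + (1.0 - lam) * p2
    y_beta = beta * p1 + (1.0 - beta) * p2
    log_zeta = mu * (y - y_hat) * lam * (1.0 - lam)            # ln zeta(t)
    num = lam * math.exp(log_zeta * p1)
    lam_next = num / (num + (1.0 - lam) * math.exp(log_zeta * p2))   # update (4)
    lhs = a * (y - y_hat) ** 2 - b * (y - y_beta) ** 2
    rhs = beta * math.log(lam_next / lam) + (1.0 - beta) * math.log((1.0 - lam_next) / (1.0 - lam))
    return rhs - lhs


def worst_slack(trials=20000, seed=0, Y=2.0, lam_plus=0.1, eps=0.5):
    """Smallest slack of (17) over random inputs satisfying the theorem's proviso."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        y, p1, p2 = (rng.uniform(-Y, Y) for _ in range(3))
        beta = rng.random()
        lam = rng.uniform(lam_plus, 1.0 - lam_plus)
        worst = min(worst, step_slack(y, p1, p2, beta, lam, eps, Y, lam_plus))
    return worst


print(worst_slack())
```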



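After telescoping, i.e., summation over $t = 1, \ldots, n$, (17) yields, for
every $\beta \in [0, 1]$,

$$\frac{(1-z^2)\epsilon}{Y^2(2\epsilon+1)}L_n(\hat{y}, y) - \frac{\epsilon}{Y^2}L_n(\hat{y}_\beta, y) \le \beta\ln\frac{\lambda(n+1)}{\lambda(1)} + (1-\beta)\ln\frac{1-\lambda(n+1)}{1-\lambda(1)} \le \max\left\{\ln\frac{1}{\lambda(1)}, \ln\frac{1}{1-\lambda(1)}\right\} = O(1),$$

since $\lambda(n+1) \le 1$ and $1-\lambda(n+1) \le 1$. Multiplying both sides by
$\frac{Y^2(2\epsilon+1)}{(1-z^2)\epsilon}$ and choosing the best $\beta$, we obtain

$$L_n(\hat{y}, y) - \frac{2\epsilon+1}{1-z^2}\min_{\beta\in[0,1]}\left\{L_n(\hat{y}_\beta, y)\right\} \le \frac{(2\epsilon+1)Y^2}{(1-z^2)\epsilon}\max\left\{\ln\frac{1}{\lambda(1)}, \ln\frac{1}{1-\lambda(1)}\right\} = O\!\left(\frac{1}{\epsilon}\right), \tag{18}$$

which is the desired bound.

Note that using $b = \epsilon/Y^2$, $a = \frac{(1-z^2)\epsilon}{Y^2(2\epsilon+1)}$ and $s = \frac{Y^2}{2} + \frac{1}{4b} = \frac{Y^2(2\epsilon+1)}{4\epsilon}$,
we get $\sqrt{1-4as} = z$ and hence

$$\mu = \frac{2 + 2\sqrt{1-4as}}{s} = \frac{4\epsilon}{2\epsilon+1}\,\frac{2+2z}{Y^2}$$

after some algebra, as in the statement of the theorem. This
concludes the proof of the theorem. □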
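The bound (18) can also be evaluated on a given sequence. The sketch below is an illustration under assumptions (hypothetical helper names, $\lambda(1) = 1/2$, no clipping of $\rho(t)$). It computes the best convex combination in hindsight in closed form, using the fact that $L_n(\hat y_\beta, y)$ is a convex quadratic in $\beta$, so the constrained minimizer is the unconstrained one clipped to $[0,1]$. It then runs (2) with the step size of the theorem and returns whether $\lambda(t)$ stayed in $[\lambda^{+}, 1-\lambda^{+}]$ (the theorem's proviso), together with both sides of (18) using the explicit constant.

```python
import math
import random
from typing import Sequence, Tuple


def best_beta(y: Sequence[float], y1: Sequence[float], y2: Sequence[float]) -> float:
    """Minimizer of L_n(y_beta, y) over beta in [0, 1]; needs the whole sequences."""
    num = sum((yt - q) * (p - q) for yt, p, q in zip(y, y1, y2))
    den = sum((p - q) ** 2 for p, q in zip(y1, y2))
    return 0.5 if den == 0.0 else min(1.0, max(0.0, num / den))


def check_bound(y, y1, y2, Y, lam_plus, eps, lam1=0.5) -> Tuple[bool, float, float]:
    """Run (2) with the theorem's step size; return (proviso held, LHS of (18), RHS of (18))."""
    c = lam_plus * (1.0 - lam_plus)
    z = (1.0 - 4.0 * c) / (1.0 + 4.0 * c)
    mu = 4.0 * eps / (2.0 * eps + 1.0) * (2.0 + 2.0 * z) / Y ** 2
    rho, loss, in_range = math.log(lam1 / (1.0 - lam1)), 0.0, True
    for yt, p, q in zip(y, y1, y2):
        lam = 1.0 / (1.0 + math.exp(-rho))
        in_range = in_range and lam_plus <= lam <= 1.0 - lam_plus
        e = yt - (lam * p + (1.0 - lam) * q)
        loss += e * e                                  # accumulates L_n(y_hat, y)
        rho += mu * e * lam * (1.0 - lam) * (p - q)    # update (2), unclipped
    beta = best_beta(y, y1, y2)
    best = sum((yt - beta * p - (1.0 - beta) * q) ** 2 for yt, p, q in zip(y, y1, y2))
    ratio = (2.0 * eps + 1.0) / (1.0 - z * z)
    lhs = loss - ratio * best
    rhs = ratio * Y ** 2 / eps * max(math.log(1.0 / lam1), math.log(1.0 / (1.0 - lam1)))
    return in_range, lhs, rhs


n = 100
y = [0.8 * (-1) ** t for t in range(n)]
print(check_bound(y, y, [0.0] * n, Y=1.0, lam_plus=0.01, eps=1.0))
```

In this toy sequence the first filter is exact, so $\lambda(t)$ rises from $1/2$ but, over this short horizon, stays below $1-\lambda^{+}$; whenever the proviso fails, (18) is no longer guaranteed by the theorem.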

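In the following lemma, we show that, under the same
methodology with the KL divergence as the distance measure,
the order of the upper bound cannot be improved, by presenting
an example in which the bound on $b/a$ is of the same order as
that given in the theorem.

Lemma: For positive real constants $a$, $b$ and $\mu$ that satisfy
(5) for all $|y(t)| \le Y$, $|\hat{y}_1(t)| \le Y$ and $|\hat{y}_2(t)| \le Y$ and
$\lambda(t) \in [\lambda^{+}, 1-\lambda^{+}]$, we require

$$\frac{b}{a} \ge \frac{1}{4} + \frac{1}{16\lambda^{+}(1-\lambda^{+})}.$$

Proof: Since the inequality in (5) should be satisfied for all
possible $y(t)$, $\hat{y}_1(t)$, $\hat{y}_2(t)$, $\beta$ and $\lambda(t)$, the proper values of $a$,
$b$ and $\mu$ should satisfy (5) for any particular selection of $y(t)$,
$\hat{y}_1(t)$, $\hat{y}_2(t)$, $\beta$ and $\lambda(t)$. First, we consider $y(t) = \hat{y}_1(t) = Y$,
$\hat{y}_2(t) = 0$, $\beta = 1$ and $\lambda(t) = \lambda^{+}$ (or, similarly, $y(t) = \hat{y}_1(t) =
Y$, $\hat{y}_2(t) = -Y$ and $\lambda(t) = \lambda^{+}$). In this case, $\hat{y}(t) = \lambda^{+}Y$ and
$e(t) = (1-\lambda^{+})Y$, and we have

$$a(Y - \lambda^{+}Y)^2 \le -\ln\left(\lambda^{+} + (1-\lambda^{+})e^{\mu(1-\lambda^{+})Y\lambda^{+}(1-\lambda^{+})(-Y)}\right)$$

$$\le -\lambda^{+}\ln 1 - \mu(1-\lambda^{+})^2\lambda^{+}Y(1-\lambda^{+})(-Y) \tag{19}$$

$$= \mu(1-\lambda^{+})^3\lambda^{+}Y^2, \tag{20}$$

where (19) follows from Jensen's inequality for the concave
function $\ln(\cdot)$. By (20), we have

$$\mu \ge \frac{a}{\lambda^{+}(1-\lambda^{+})}. \tag{21}$$

For another particular case where $\hat{y}_1(t) = Y$, $y(t) =
\hat{y}_2(t) = 0$, $\beta = 1$ and $\lambda(t) = 1/2$, we have

$$a\left(-\frac{Y}{2}\right)^2 - b(-Y)^2 \le -\ln\left(\frac{1}{2} + \frac{1}{2}e^{-\mu\left(-\frac{Y}{2}\right)\frac{1}{4}Y}\right) \le -\frac{1}{2}\,\mu\,\frac{Y^2}{8}, \tag{22}$$

where (22) also follows from Jensen's inequality. By (22),
we have

$$b \ge \frac{a}{4} + \frac{\mu}{16} \ge \frac{a}{4} + \frac{a}{16\lambda^{+}(1-\lambda^{+})}, \tag{23}$$

where (23) follows from (21), which finalizes the proof. □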
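To see that the orders match, one can compare the ratio $b/a = (2\epsilon+1)/(1-z^2)$ achieved in the theorem with the lower bound of the lemma. With $c = \lambda^{+}(1-\lambda^{+})$, one has $1/(1-z^2) = (1+4c)^2/(16c) = 1/(16c) + 1/2 + c$, so both quantities behave as $1/(16c)$ as $\lambda^{+}\to 0$. The sketch below uses illustrative function names; passing `eps=0.0` evaluates only the $\epsilon\to 0$ limit, since the theorem itself requires $\epsilon>0$.

```python
def ratio_theorem(lam_plus: float, eps: float) -> float:
    """b/a = (2 eps + 1) / (1 - z^2) obtained with the theorem's choice of a and b."""
    c = lam_plus * (1.0 - lam_plus)
    z = (1.0 - 4.0 * c) / (1.0 + 4.0 * c)
    return (2.0 * eps + 1.0) / (1.0 - z * z)


def ratio_lemma(lam_plus: float) -> float:
    """Lower bound on b/a required by the lemma."""
    c = lam_plus * (1.0 - lam_plus)
    return 0.25 + 1.0 / (16.0 * c)


for lp in (0.25, 0.1, 0.01, 0.001):
    # eps = 0.0 gives the eps -> 0 limit of the theorem's ratio
    print(lp, ratio_theorem(lp, eps=0.0), ratio_lemma(lp))
```

The theorem's ratio always exceeds the lemma's bound, as it must, and the two approach each other as $\lambda^{+}$ decreases.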

IV. Conclusion 

In this paper, we introduce a new and deterministic analysis
of the convexly constrained adaptive mixture of [2] without
any statistical assumptions on the underlying signals or
any approximations in the derivations. We relate the
time-accumulated squared estimation error of this adaptive
mixture at any time to the time-accumulated squared estimation
error of the optimal convex combination of the constituent
filters, which can only be chosen in hindsight. We refrain from
making statistical assumptions on the underlying signals, and
our results are guaranteed to hold in an individual sequence
manner. We also demonstrate, by providing counterexamples,
that the proof methodology cannot be directly modified to
obtain a better bound in terms of the convergence rate. To this
end, we provide both the transient and steady-state analysis
of this adaptive mixture in a deterministic sense without any
assumptions on the underlying signals or any approximations
in the derivations.